ABSTRACT


With an ever-growing demand for low-power devices, there is a general trend to
search for ways to reduce the power consumption of a system. Multipliers are
an important requirement in applications linked to Digital Signal Processing,
Communication Systems, Optical Computing, Nanotechnology, Low-Power
Very Large Scale Integration and Quantum Computing. Conventional
multiplication is a long and time-consuming process.
The use of Vedic mathematics has led to a great reduction in the time required
for such calculations. The extensive use of the Urdhava Tiryakbhyam sutra in
multiplication demonstrates its effectiveness and simplicity in this domain.
This sutra supports pipelining, a method employed to reduce the power
used by a system. Reversible logic has been gaining
demand due to its low-power capabilities and is currently used in many
computing applications. The paper proposes two multiplier systems:
one design employs the Urdhava Tiryakbhyam sutra along with pipelining
and the second incorporates reversible logic gates into the first design. The
proposed systems provide lower delay for result computation and lower
hardware utilization when compared to non-pipelined Vedic multipliers.
1. INTRODUCTION

The speed of a system depends mainly on the multiplier used, and reducing the delay in
multiplication greatly increases the throughput of the system [1]. During multiplication, a large number of
partial products are generated. When designing a system, it is always desirable to reduce the computational
delay in producing these partial products. To make multipliers suitable for VLSI implementation, they
must be fast, consume little power and occupy little area.

Multiplication consists of two steps: partial product generation followed by partial product
accumulation. The methods adopted in these steps determine the time taken to
produce the end result. Vedic mathematics specifies 16 sutras (formulae) and sub-sutras (sub-formulae)
which can be employed in different arithmetic calculations [2]. Calculations using the techniques laid
out in Vedic mathematics are reported to be faster than those using conventional mathematical techniques
[3]. The Urdhava Tiryakbhyam (UT) sutra is the most commonly used Vedic formula for multiplication. It
employs vertical and crosswise multiplication of the digits being multiplied [4]. This sutra supports
the use of pipelining to sum the partial products. The Vedic multiplier thus consists of a set of partial product
generators whose outputs are summed by adders that work in parallel.

Reversible logic gates are used in circuits for low-power applications [5].
These circuits dissipate zero heat under ideal physical circumstances, as they do not erase information.
There is a one-to-one mapping between the input and output vectors; that is, each input vector can be
uniquely retrieved from its output vector and vice versa [6, 7].

The two main drawbacks of multipliers are their high delay in generating products and their excessive
power consumption [8]. This paper proposes a multiplier system combining the technique of Vedic
multipliers with the use of reversible gates. The algorithm of the system is coded in the Verilog hardware description
language (HDL). Reversible circuits and Vedic multipliers can be designed efficiently using HDL [9, 10].
The proposed multiplier system is implemented on a Spartan 3E series FPGA. The
results show that the combined use of the UT sutra and reversible gates increases the speed and reduces
the power consumption of the system. The UT sutra requires fewer steps to calculate the product,
and its parallel computation further reduces the time for output generation
[11]. Reversible logic implemented using the Toffoli gate requires low power during its operation [12].
The proposed multiplier system delivers results faster and consumes less power, and thus overcomes the two
drawbacks of common multipliers.

This paper is organized as follows: Section 2 contains an overview of pipelined Vedic multipliers, 
reversible logic and the proposed multiplier system. Section 3 highlights the results obtained by the proposed 
system. Section 4 concludes the work. 


2. PROPOSED SYSTEM 
2.1. Pipelined Vedic Multipliers 

Calculations employing the techniques of Vedic mathematics reduce the complexity of the
working of a multiplier. This greatly decreases the computational time, and outputs are obtained
with less delay. The UT sutra is the technique implemented in the proposed multipliers. The method of vertical and
cross-wise multiplication adopted by this sutra for two 2-bit binary numbers is described in Figure 1.

As can be seen, Step 1 produces the vertical product 1, which is represented by P1. Step 2
produces two sub-partial binary products, 1 and 1, by cross-wise multiplication, and their sum is binary
10. This is held in the 2-bit register P2(1:0), where P2(0) is the LSB and P2(1) is the MSB or carry.
The sub-partial binary product 1 obtained in Step 3 is added to the carry of P2, P2(1), to give P3(1:0).

This results in the 4-bit binary number 1001, which is written as


P3(1:0) P2(0) P1


where P3(1:0) = 10,
P2(0) = 0 and
P1 = 1.
The product of the two 2-bit binary numbers 11 and 11 is therefore 1001.





Figure 1. UT sutra for two 2-bit binary numbers 


Let us consider the case where two 4-bit binary numbers 1111 and 1111 are multiplied using UT sutra. 


Step 1: 1
Step 2: (1+1)
Step 3: (1+1+1)
Step 4: (1+1+1+1)
Step 5: (1+1+1)
Step 6: (1+1)
Step 7: 1

The final result of the multiplication is obtained when the carries of the previous partial products are
summed together with the next partial products [11-14]. This is described as follows:


Step 1: P1 = 1.1 = 1
Step 2: P2(1:0) = 1.1 + 1.1 = 10; P2(1) = 1 & P2(0) = 0
Step 3: P3(2:0) = 1.1 + 1.1 + 1.1 + P2(1) = 1+1+1+1 = 100; P3(2) = 1, P3(1) = 0 & P3(0) = 0
Step 4: P4(2:0) = 1.1 + 1.1 + 1.1 + 1.1 + P3(1) = 1+1+1+1+0 = 100; P4(2) = 1, P4(1) = 0 & P4(0) = 0
Step 5: P5(2:0) = 1.1 + 1.1 + 1.1 + P3(2) + P4(1) = 1+1+1+1+0 = 100; P5(2) = 1, P5(1) = 0 & P5(0) = 0
Step 6: P6(1:0) = 1.1 + 1.1 + P4(2) + P5(1) = 1+1+1+0 = 11; P6(1) = 1 & P6(0) = 1
Step 7: P7(1:0) = 1.1 + P5(2) + P6(1) = 1+1+1 = 11





As is seen, the LSB of each step is retained and all the higher bits are treated as carries and
passed on to the following steps. For example, 100 is the result obtained in Step 3. The LSB 0 is retained and
the higher bits (carries) 1 and 0 are passed to the next two steps: 0 goes to Step 4 and 1 goes to Step 5. In Step 7, the 2-bit
result 11 is retained in full, and so the final result is an 8-bit binary number:


{P7(1:0)P6(0)P5(0)P4(0)P3(0)P2(0)P1(0)} = 11100001 


This shows how the UT sutra is used to generate the partial products.
These partial products are obtained in parallel.
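
The column-by-column procedure worked through above can be sketched in software. The following is a minimal Python illustration, not the Verilog design of the paper; the function name `ut_multiply` and the `width` parameter are assumptions introduced here. It keeps the LSB of each column sum and routes carry bit k of a column to the column k places higher, as in the 4-bit example. As a simplification, the final carry is handled as one extra column rather than as a 2-bit last step.

```python
def ut_multiply(a: int, b: int, width: int) -> int:
    """Multiply two unsigned width-bit integers column by column (UT style)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    n_cols = 2 * width                  # one result bit per column
    carries = [0] * (2 * n_cols)        # carry bits waiting for later columns
    product = 0
    for col in range(n_cols):
        # Start with the carry bits routed here by earlier columns.
        total = carries[col]
        # Add the cross-wise 1-bit products whose indices sum to this column.
        for i in range(width):
            k = col - i
            if 0 <= k < width:
                total += a_bits[i] & b_bits[k]
        # Keep the LSB as the result bit of this column.
        product |= (total & 1) << col
        # Route each higher bit to the column 'offset' places further on.
        offset = 1
        while total >> offset:
            carries[col + offset] += (total >> offset) & 1
            offset += 1
    return product


print(format(ut_multiply(0b1111, 0b1111, 4), "08b"))
```

Because `width` is a parameter, the same sketch can be called with 8 or 16 to check wider operands against ordinary integer multiplication.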

In a 4x4 bit multiplier, two 4-bit binary numbers are multiplied to give an 8-bit output. There are 6
steps (Step 2 to Step 7) where additions take place to produce the output. These additions can be performed in a pipelined
manner to compute the final result [15]. The partial outputs are obtained by adding the
sub-partial outputs produced during single-bit multiplication. The addition requires adders, and the number of
bits added varies from step to step.







The partial products P1 to P7 obtained in each step are added up in parallel. The result of the multiplier is
a concatenation of the LSBs obtained in Step 1 to Step 6 and the partial product obtained in Step 7 = {P7(1:0) P6(0)
P5(0) P4(0) P3(0) P2(0) P1(0)}.

The same concept can be extended to an 8x8 bit multiplier, which yields a 16-bit product. There are
14 steps where additions are required, and performing them in parallel greatly reduces
the time the multiplier takes to compute the final result.


P1 = a0b0
P2 = a0b1 + a1b0
P3 = a0b2 + a1b1 + a2b0 + P2(1)
P4 = a0b3 + a1b2 + a2b1 + a3b0 + P3(1)
P5 = a0b4 + a1b3 + a2b2 + a3b1 + a4b0 + P3(2) + P4(1)
P6 = a0b5 + a1b4 + a2b3 + a3b2 + a4b1 + a5b0 + P4(2) + P5(1)
P7 = a0b6 + a1b5 + a2b4 + a3b3 + a4b2 + a5b1 + a6b0 + P5(2) + P6(1)
P8 = a0b7 + a1b6 + a2b5 + a3b4 + a4b3 + a5b2 + a6b1 + a7b0 + P6(2) + P7(1)
P9 = a1b7 + a2b6 + a3b5 + a4b4 + a5b3 + a6b2 + a7b1 + P6(3) + P7(2) + P8(1)
P10 = a2b7 + a3b6 + a4b5 + a5b4 + a6b3 + a7b2 + P7(3) + P8(2) + P9(1)
P11 = a3b7 + a4b6 + a5b5 + a6b4 + a7b3 + P8(3) + P9(2) + P10(1)
P12 = a4b7 + a5b6 + a6b5 + a7b4 + P9(3) + P10(2) + P11(1)
P13 = a5b7 + a6b6 + a7b5 + P10(3) + P11(2) + P12(1)
P14 = a6b7 + a7b6 + P12(2) + P13(1)
P15 = a7b7 + P13(2) + P14(1)


The final result of the multiplier = {P15(1:0) P14(0) P13(0) P12(0) P11(0) P10(0) P9(0) P8(0) P7(0)
P6(0) P5(0) P4(0) P3(0) P2(0) P1(0)}, which is a concatenation of the LSBs obtained in Step 1 to Step 14 and the
partial product obtained in Step 15.

Using the same concept, a 16x16 bit multiplier is also designed. It has 30 steps containing
additions, and these are performed in parallel. The 32-bit output is obtained by concatenating the
LSBs obtained in Step 1 to Step 30 and the partial product obtained in Step 31. The final result of the multiplier =
{P31(1:0) P30(0) P29(0) P28(0) P27(0) ... P5(0) P4(0) P3(0) P2(0) P1(0)}.

A 4x4 bit Vedic multiplier is built from four 2x2 bit Vedic multipliers. These 2x2 bit
multipliers employ pipelining for partial product generation and are therefore pipelined
multipliers, which makes the 4x4 bit multiplier pipelined as well. The products are obtained in parallel from each of
the four pipelined 2x2 bit multipliers. These are partial products and are added in a pipelined manner using
three adders to yield the result of the 4x4 bit pipelined multiplier. The partial product accumulation method is
used in these cases. In the system, lower-bit multipliers are used to form higher-bit multipliers. The proposed
system, as shown in Figure 2, thus uses pipelining for both partial product generation and accumulation [16].


[Figure: four 2x2 bit pipelined multipliers with inputs (b[3:2], a[3:2]), (b[3:2], a[1:0]), (b[1:0], a[3:2]) and
(b[1:0], a[1:0]); their partial products feed the adders that produce P[7:6], P[5:4], P[3:2] and P[1:0].]


Figure 2. Proposed pipelined 4x4 bit vedic multiplier employing partial product generation 
and accumulation 


PP0, PP1, PP2 and PP3 are the 4-bit outputs of the four 2x2 bit multipliers. They are the partial
products, and three adders are used to accumulate them. The final 8-bit product of
the 4x4 bit multiplier is a concatenation of the outputs of these three adders and the low bits of PP0,
{P(7:6), P(5:4), P(3:2), P(1:0)}, where:

Step 1: P(7:6) = PP3(3:2) + Carry of Adder 2
Step 2: P(5:4) = PP3(1:0) + PP2(3:2) + PP1(3:2) + Carry of Adder 1
Step 3: P(3:2) = PP2(1:0) + PP1(1:0) + PP0(3:2)
Step 4: P(1:0) = PP0(1:0)
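
The four accumulation steps can be checked with a short behavioural sketch. This is an illustrative Python model, assuming that PP0 is the product of the two low halves, PP3 the product of the two high halves, and PP1 and PP2 the two cross products (their order does not affect the sum). The names `mul2x2` and `mul4x4` are hypothetical, and `mul2x2` is only a stand-in for a 2x2 bit multiplier.

```python
def mul2x2(a: int, b: int) -> int:
    """Stand-in for a 2x2 bit multiplier; returns a 4-bit partial product."""
    return (a & 0b11) * (b & 0b11)


def mul4x4(a: int, b: int) -> int:
    """4x4 bit multiplier built from four 2x2 products and three adders."""
    a_lo, a_hi = a & 0b11, (a >> 2) & 0b11
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11

    # Partial product generation (assumed assignment of PP1 and PP2).
    pp0 = mul2x2(a_lo, b_lo)
    pp1 = mul2x2(a_hi, b_lo)
    pp2 = mul2x2(a_lo, b_hi)
    pp3 = mul2x2(a_hi, b_hi)

    # P(1:0) = PP0(1:0), no addition needed.
    p10 = pp0 & 0b11
    # Adder 1: P(3:2) = PP2(1:0) + PP1(1:0) + PP0(3:2)
    s1 = (pp2 & 0b11) + (pp1 & 0b11) + (pp0 >> 2)
    p32, carry1 = s1 & 0b11, s1 >> 2
    # Adder 2: P(5:4) = PP3(1:0) + PP2(3:2) + PP1(3:2) + carry of adder 1
    s2 = (pp3 & 0b11) + (pp2 >> 2) + (pp1 >> 2) + carry1
    p54, carry2 = s2 & 0b11, s2 >> 2
    # Adder 3: P(7:6) = PP3(3:2) + carry of adder 2
    p76 = ((pp3 >> 2) + carry2) & 0b11

    return (p76 << 6) | (p54 << 4) | (p32 << 2) | p10


print(format(mul4x4(0b1111, 0b1111), "08b"))
```

Note that the carry out of an adder can be wider than one bit here (for example, adder 1 can sum to 8), so the sketch passes the whole upper part of each sum forward.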


Similarly, a pipelined 8x8 bit Vedic multiplier is built from four pipelined 4x4 bit Vedic
multipliers, and a 16x16 bit pipelined multiplier from four 8x8 bit pipelined multipliers. The same
concept of parallel partial product accumulation is followed in the 8x8 bit and 16x16 bit multipliers.
Hence parallel addition is used throughout the design of the proposed pipelined Vedic multipliers.
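
The way lower-bit multipliers are composed into higher-bit ones can be expressed recursively. The sketch below is a Python model under stated assumptions: `width` is a power of two no smaller than 2, the names `ut_2x2` and `vedic_multiply` are introduced here for illustration, and the accumulation is written with ordinary integer addition rather than the individual adder slices of the hardware design.

```python
def ut_2x2(a: int, b: int) -> int:
    """2x2 bit UT multiplier: vertical, cross-wise, vertical."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p1 = a0 & b0                      # Step 1: vertical product of the LSBs
    p2 = (a1 & b0) + (a0 & b1)        # Step 2: cross-wise products
    p3 = (a1 & b1) + (p2 >> 1)        # Step 3: vertical product plus carry
    return (p3 << 2) | ((p2 & 1) << 1) | p1


def vedic_multiply(a: int, b: int, width: int) -> int:
    """Build a width x width multiplier from four (width/2) x (width/2) ones."""
    if width == 2:
        return ut_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Four partial products; in hardware these are produced in parallel.
    pp0 = vedic_multiply(a_lo, b_lo, half)
    pp1 = vedic_multiply(a_hi, b_lo, half)
    pp2 = vedic_multiply(a_lo, b_hi, half)
    pp3 = vedic_multiply(a_hi, b_hi, half)
    # Accumulate the partial products at their respective weights.
    return pp0 + ((pp1 + pp2) << half) + (pp3 << width)


print(vedic_multiply(65535, 65535, 16) == 65535 * 65535)
```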


2.2. Reversible Logic 

Computing technologies are currently shifting towards reversible computing, which includes the use of
reversible logic gates. An m*m logic gate has m inputs and m outputs. The most commonly used reversible gates
are the Feynman gate, Toffoli gate and Fredkin gate [6, 8]. Feynman is a 2*2 gate, while Toffoli and Fredkin
are 3*3 gates. The inputs to the Toffoli and Fredkin gates are A, B and C and the outputs are P, Q and R.
The Feynman gate has two inputs, A and B, and two outputs, P and Q. The equations for these gates are given below.


Feynman Gate: P = A;
              Q = A ⊕ B

Toffoli Gate: P = A;
              Q = B;
              R = AB ⊕ C

Fredkin Gate: P = A;
              Q = B ⊕ AB ⊕ AC;
              R = C ⊕ AB ⊕ AC
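
The three gates can be modelled as small Boolean functions to confirm that each one is a one-to-one mapping. This is a minimal Python sketch; the Fredkin outputs are taken as Q = B ⊕ AB ⊕ AC and R = C ⊕ AB ⊕ AC, and the helper `is_reversible` is a name introduced here for illustration.

```python
from itertools import product


def feynman(a: int, b: int) -> tuple:
    """2*2 Feynman gate: P = A, Q = A xor B."""
    return a, a ^ b


def toffoli(a: int, b: int, c: int) -> tuple:
    """3*3 Toffoli gate: P = A, Q = B, R = AB xor C."""
    return a, b, (a & b) ^ c


def fredkin(a: int, b: int, c: int) -> tuple:
    """3*3 Fredkin gate: swaps B and C when A is 1."""
    q = b ^ (a & b) ^ (a & c)
    r = c ^ (a & b) ^ (a & c)
    return a, q, r


def is_reversible(gate, n_inputs: int) -> bool:
    """A gate is reversible if distinct inputs give distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_inputs)}
    return len(outputs) == 2 ** n_inputs


for name, gate, n in (("Feynman", feynman, 2),
                      ("Toffoli", toffoli, 3),
                      ("Fredkin", fredkin, 3)):
    print(name, is_reversible(gate, n))
```

Checking that the set of output vectors has as many members as there are input vectors is equivalent to checking the one-to-one mapping described for reversible logic.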


The Toffoli gate is the most commonly used reversible gate [17]. Its advantage is
that when the two inputs A and B are applied with the third input C held constant at 0, the output R is always the
product of the two inputs. This makes it suitable for application to the UT sutra, and it is therefore chosen
for multiplication in the proposed system. The truth table of the Toffoli gate is shown in Table 1.

Table 2 shows only the output R, and it can be observed that R is the product of inputs A and B when
the third input C is 0. This makes the Toffoli gate suitable for 1x1 bit multiplication. Using this
property, 2x2 bit, 4x4 bit, 8x8 bit and 16x16 bit reversible multipliers are designed for the proposed system.
Figure 3 (a) gives the circuit representation and (b) the functionality of the Toffoli gate.


Table 1. Truth Table of Toffoli Gate

Inputs      Outputs
A  B  C     P  Q  R
0  0  0     0  0  0
0  0  1     0  0  1
0  1  0     0  1  0
0  1  1     0  1  1
1  0  0     1  0  0
1  0  1     1  0  1
1  1  0     1  1  1
1  1  1     1  1  0

Table 2. Truth Table of Toffoli Gate with third input C = 0

A  B  C     R
0  0  0     0
0  1  0     0
1  0  0     0
1  1  0     1
[Figure: (a) Toffoli gate block with inputs A, B, C and outputs P = A, Q = B, R = AB ⊕ C; (b) the same mapping drawn as a functional diagram.]


Figure 3. (a) Circuit representation of Toffoli gate (b) Diagram of its functionality 

A 2-bit multiplier employing the Toffoli gate is shown in Figure 4. x1x0 and y1y0 are the two 2-bit
numbers chosen for multiplication. In all the gates, the third input is held at 0, so that the
output is the product of the other two inputs to the gate. P00 is the product obtained by multiplying the LSBs x0
and y0, P01 is the product of x0 and y1, P10 is the product of x1 and y0 and P11 is the product of x1 and y1.


[Figure: four Toffoli gates, each with its third input held at 0, producing P00, P01, P10 and P11.]


Figure 4. 2x2-bit binary multiplier using Toffoli gate


Step 1: P1 = P00
Step 2: P2 = P10 + P01
Step 3: P3 = P11 + Carry of P2
Product of the multiplier: (Carry of P3) (LSB of P3) (LSB of P2) (P1)
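
The three steps above can be modelled directly, with each 1x1 bit product taken from a Toffoli gate whose third input is 0. This is a hedged Python sketch rather than the Verilog used in the paper: the function names are introduced here, and the additions in Steps 2 and 3 use ordinary integer addition, since the text does not state how the adders are realised.

```python
def toffoli(a: int, b: int, c: int) -> tuple:
    """3*3 Toffoli gate: P = A, Q = B, R = AB xor C."""
    return a, b, (a & b) ^ c


def bit_product(x: int, y: int) -> int:
    """1x1 bit multiplication: output R of a Toffoli gate with C = 0."""
    return toffoli(x, y, 0)[2]


def reversible_mul2x2(x: int, y: int) -> int:
    """2x2 bit multiplier following Steps 1 to 3."""
    x0, x1 = x & 1, (x >> 1) & 1
    y0, y1 = y & 1, (y >> 1) & 1
    p00 = bit_product(x0, y0)
    p01 = bit_product(x0, y1)
    p10 = bit_product(x1, y0)
    p11 = bit_product(x1, y1)

    p1 = p00                  # Step 1
    p2 = p10 + p01            # Step 2 (two bits: carry and LSB)
    p3 = p11 + (p2 >> 1)      # Step 3 (adds the carry of P2)

    # (Carry of P3) (LSB of P3) (LSB of P2) (P1)
    return ((p3 >> 1) << 3) | ((p3 & 1) << 2) | ((p2 & 1) << 1) | p1


print(format(reversible_mul2x2(0b11, 0b11), "04b"))
```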

Four such 2x2 bit reversible pipelined multipliers are used to create a 4x4 bit reversible pipelined
multiplier. Likewise, the 8x8 bit and 16x16 bit reversible pipelined multipliers are built from four 4x4 bit and
four 8x8 bit pipelined multipliers respectively, employing reversible gates for multiplication.


3. RESULTS AND ANALYSIS 
The delay of the proposed system employing reversible gates is seen to be less than the delays of a
non-pipelined multiplier and a pipelined multiplier employing the UT sutra. Figure 5 shows a comparison
of the delays of the three multiplier systems.

Figure 5. Computational delay comparison of different vedic multipliers


It is observed that the speed of the pipelined Vedic multiplier is greater than that of its non-pipelined
version, owing to the introduction of pipelining in partial product generation and accumulation.
Incorporating Toffoli gates into these pipelined Vedic multipliers reduces the computational delay further,
due to the inherent reversible capability of the gate. From Figure 5, it can be noted that the delay for
computing the final result in the 4x4 bit pipelined multiplier is 17.08% lower than that of the
4x4 bit non-pipelined counterpart. There is an 8.99% delay reduction in the 8x8 bit pipelined multiplier compared to the
non-pipelined 8x8 bit multiplier. In the case of 16x16 bit multipliers, the delay reduction is 23.40%.
When the pipelined and reversible pipelined multipliers are compared, there is a reduction in delay of 0.21%
in the 4x4 bit, 10.48% in the 8x8 bit and 11.49% in the 16x16 bit multipliers, as shown in Table 3.


Table 3. Hardware Utilization Comparison of Different Vedic Multipliers


Type of Vedic            4x4 bit            8x8 bit            16x16 bit
Multiplier               No. of   No. of    No. of   No. of    No. of   No. of
                         LUTs     Slices    LUTs     Slices    LUTs     Slices
Non Pipelined            42       22        201      106       841      445
Pipelined                33       17        174      98        722      409
Reversible Pipelined     32       17        171      94        705      390


The proposed system is implemented on a Spartan 3E series FPGA. The hardware used on a Field
Programmable Gate Array (FPGA) chip is reported in terms of the number of slices, and these slices
are made up of LUTs. The number of LUTs per slice varies with the family of the FPGA chip.
Comparing the non-pipelined and pipelined Vedic multipliers, the 4x4 bit pipelined design shows a decrease of 21.42%
in the number of LUTs and 22.72% in the number of slices used. Similarly,
in 8x8 bit there is a reduction of 13.43% in the LUTs and 7.54% in the slices used. In 16x16 bit, the number
of LUTs used decreases by 14.14% and the number of slices by 8.08%. The proposed reversible pipelined Vedic
multipliers show a further reduction in FPGA hardware utilization when compared with the
pipelined Vedic multipliers. The reduction in the number of LUTs in 4x4 bit is 3.03%, while the number of slices is unchanged. In 8x8 bit,
the decrease in the number of LUTs and slices is 1.72% and 4.08% respectively. In 16x16 bit, the reduction is 2.35%
for LUTs and 4.64% for slices.
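
The percentages quoted above follow from the LUT and slice counts by a simple relative-difference calculation. The sketch below is an illustrative Python check; the dictionary layout and function names are assumptions made here, and the 4x4 bit reversible pipelined LUT count is taken as 32, the value stated in the conclusion.

```python
# (LUTs, slices) per multiplier size, as reported for the three designs.
UTILIZATION = {
    "non_pipelined":        {"4x4": (42, 22), "8x8": (201, 106), "16x16": (841, 445)},
    "pipelined":            {"4x4": (33, 17), "8x8": (174, 98),  "16x16": (722, 409)},
    "reversible_pipelined": {"4x4": (32, 17), "8x8": (171, 94),  "16x16": (705, 390)},
}


def percent_reduction(before: int, after: int) -> float:
    """Relative reduction from 'before' to 'after', in percent."""
    return 100.0 * (before - after) / before


def compare(baseline: str, design: str) -> dict:
    """Return {size: (LUT reduction %, slice reduction %)} for two designs."""
    result = {}
    for size, (luts_b, slices_b) in UTILIZATION[baseline].items():
        luts_d, slices_d = UTILIZATION[design][size]
        result[size] = (percent_reduction(luts_b, luts_d),
                        percent_reduction(slices_b, slices_d))
    return result


for pair in (("non_pipelined", "pipelined"),
             ("pipelined", "reversible_pipelined")):
    for size, (lut_pct, slice_pct) in compare(*pair).items():
        print(pair[1], size, f"LUTs {lut_pct:.2f}%", f"slices {slice_pct:.2f}%")
```

The computed values agree with the quoted figures to within rounding in the second decimal place.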

In a VLSI design, the important characteristics considered are power, delay and area. To achieve low power
consumption in the proposed multiplier, the focus was mainly on reducing the computational delay and the
hardware utilization on the FPGA. It is seen that the proposed system employing the UT sutra and the
reversible Toffoli gate increases the throughput of the system and reduces the hardware utilized.
The UT sutra uses pipelining for the additions in partial product generation and accumulation, and so
results in faster computation. Further, the reduction in hardware utilization lowers the power
consumption of the system.


4. CONCLUSION 

The paper has proposed two multiplier designs: one employing pipelining in the UT sutra and
the other employing reversible logic in the pipelined multiplier. The Vedic concept combined
with pipelining and the use of reversible gates in the proposed systems is effective in
reducing the delay in output calculation and the power consumption of the multiplier system. The delays of the
16x16 bit, 8x8 bit and 4x4 bit non-pipelined Vedic multipliers are 51.78 ns, 31.5 ns and 16.97 ns, while the
delays of the reversible pipelined Vedic multipliers in the same order are reduced to 35.10 ns, 25.54 ns and
14.04 ns. The LUTs used in the 16x16 bit, 8x8 bit and 4x4 bit non-pipelined Vedic multipliers are 841, 201 and
42, while those for the reversible pipelined Vedic multipliers are 705, 171 and 32. The numbers of slices
used by the 16x16 bit, 8x8 bit and 4x4 bit non-pipelined Vedic multipliers are 445, 106 and 22, while those of the
reversible pipelined Vedic multipliers are 390, 94 and 17. The reduction in the LUTs and slices used results in
low power consumption of the proposed multiplier systems.